Abstract

We analyze some voting models mimicking online evaluation systems intended to 
reduce the information overload. The minimum number of operations needed for a 
system to be effective is analytically estimated. When herding effects are present, 
linear preferential attachment marks a transition between trustworthy and biased
reputations.
1 The General Problem



Electronic communities of all kinds, transaction systems in particular, bring
together users otherwise unknown to each other. For expressed opinions
to be trusted or for transactions to materialize, it is often crucial to rely
upon rating systems [1,2]. They collect information about participants' past
behavior, aggregate it and display the result. Such systems become most
effective if posted ratings are weighted according to raters' reputation, the
prototype application of this feedback technique being Epinions.

Previous work [3,4,5] concentrates on optimal reputation systems, assuming
that users pick the object to be judged from a uniform distribution and that
the process converges to the real values, without regard to the amount
of resources spent. In reality we have to face a limited number of evaluations
and biased probability distributions of the number of votes.

Here we shall consider a system of $N$ objects endowed with an intrinsic quality
$q_i$, $i = 1, 2, \dots, N$. At each time step $t$, a user evaluates (or ranks) $k$ objects
drawn from a probability distribution $p(t)$. We shall analyze different situations
of increasing complexity within this framework, trying to find out under
what circumstances the original values $\mathbf{q} = [q_1, q_2, \dots, q_N]$ (or their ordering)
can be reconstructed from the aggregation of the evaluations with non-zero
probability for $N \to \infty$.



2 Random evaluations 

In this section we consider the case where objects to be evaluated are chosen 
from a uniform distribution $p_i(t) = 1/N$.

Occupancy problem. As a first example we fix $k = 1$ and ask what is the
minimum number of users $t_1$ needed to evaluate every object at least once
with a given probability $P > 0$. The probability that after $t$ steps all objects
have been evaluated reads [6]:

$$P(t) \to e^{-\lambda},$$

where $\lambda \simeq N e^{-t/N}$ is the expected number of unjudged objects. The above
limit becomes greater than zero as soon as the number of evaluators $t$ reaches
the value

$$t_1 \simeq N \log\frac{N}{\lambda}. \qquad (1)$$
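The estimate above can be checked numerically. The following is a minimal sketch, assuming $P \simeq e^{-\lambda}$ with $\lambda = N e^{-t/N}$ as in the classical occupancy problem; the function names `steps_for_target` and `all_evaluated_probability` are hypothetical and introduced only for illustration.

```python
import math
import random


def steps_for_target(n_objects, target_probability):
    """Invert P ~ exp(-lambda), lambda = N exp(-t/N), to get t ~ N log(N/lambda)."""
    lam = -math.log(target_probability)  # lambda = log(1/P)
    return math.ceil(n_objects * math.log(n_objects / lam))


def all_evaluated_probability(n_objects, n_steps, n_trials=1000, seed=0):
    """Monte Carlo estimate of the probability that every object is picked at least once."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        seen = set()
        for _ in range(n_steps):
            seen.add(rng.randrange(n_objects))  # uniform choice, k = 1
        if len(seen) == n_objects:
            hits += 1
    return hits / n_trials
```

For example, `all_evaluated_probability(200, steps_for_target(200, 0.5))` should return a value close to the requested probability 0.5, up to finite-size and sampling fluctuations.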

Pairwise comparisons. Let us now fix $k = 2$ and assume that players can
just compare (not rate) one randomly chosen pair at each step. We call
$r_i$ the $i$-th ranked agent according to her intrinsic value. If raters make no
mistakes, the real rank is recovered once the $N - 1$ paired comparisons $G_{N-1} =
[(r_1, r_2), (r_2, r_3), \dots, (r_{N-1}, r_N)]$ have been performed. The probability of picking
an element belonging to $G_{N-1}$ is $2/N$, and from (1) one needs to do so at least
$N \log N$ times. The minimal number of steps needed to find $\mathbf{r} = [r_1, r_2, \dots, r_N]$
with finite probability is then

$$t_2 \sim N^2 \log(cN), \qquad (2)$$

where $c$ is a constant depending on $P$.
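The scaling of equation (2) can be illustrated by direct simulation. This is only a sketch under the assumptions of the text (error-free raters, pairs drawn uniformly); agents are labelled by their true rank, and the helper names `steps_until_chain_covered` and `mean_steps` are hypothetical.

```python
import random


def steps_until_chain_covered(n_agents, rng):
    """Draw random pairs until every adjacent pair (r_i, r_{i+1}) has been compared."""
    covered = [False] * (n_agents - 1)  # covered[i] refers to the pair (i, i + 1)
    missing = n_agents - 1
    steps = 0
    while missing:
        a, b = rng.sample(range(n_agents), 2)  # one uniformly chosen pair
        steps += 1
        lo, hi = min(a, b), max(a, b)
        if hi - lo == 1 and not covered[lo]:
            covered[lo] = True
            missing -= 1
    return steps


def mean_steps(n_agents, n_runs=200, seed=0):
    """Average number of comparisons needed over several independent runs."""
    rng = random.Random(seed)
    total = sum(steps_until_chain_covered(n_agents, rng) for _ in range(n_runs))
    return total / n_runs
```

Since each of the $N - 1$ required pairs is drawn with probability $2/[N(N-1)]$ per step, the average returned by `mean_steps(n)` should lie close to $\frac{N(N-1)}{2} \sum_{j=1}^{N-1} 1/j$, which grows like $N^2 \log N$.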

We should remark here that random comparisons are not at all a good solution
for sorting a vector of scalars: finding optimal sorting algorithms is a
classic problem of computer science [7]. In real situations, though, one does
not always have control over the evaluators' choice. Besides, evaluators make
mistakes and the outcome of a comparison is often uncertain. In this case, a
common procedure is to perform round robins with successive eliminations,
obtaining equation (2) for the minimum number of comparisons needed to find
the winner [8].
Ranking a group of peers. Let us now consider a self-evaluating community,
a group of $N$ users sharing the same expertise and voting for one another. This
is equivalent to fixing $t = N$ in our general model. Each of the $N$ agents picks $k$
randomly chosen peers and establishes a partial ranking among them according
to their intrinsic values. We want to find, in the limit of large $N$, the minimal
average value $k_r$ of $k$ that allows one to recover the "God-given" ordering with
probability $P$.

When $k \ll N$ this model can be mapped onto the previous one. In fact, making
a local ordering of $k$ elements would require about $k^2$ paired comparisons. We
then need $N k^2 \sim N^2 \log N$ comparisons to find the entire ranking. Thus

$$k_r \sim \sqrt{N \log(cN)}. \qquad (3)$$

In figure 1 this last result is shown to match simulation data well.
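A rough numerical counterpart of this argument is sketched below. It is not the simulation used for figure 1: it assumes a fixed $k$ for every agent (rather than an average value), error-free local orderings, and that an agent may happen to draw herself. The ranking is taken to be recoverable when every adjacent pair $(r_i, r_{i+1})$ appears together in at least one local sample. The function names are hypothetical.

```python
import random


def ranking_recoverable(n_agents, k, rng):
    """True if every adjacent pair (r_i, r_{i+1}) is co-sampled by at least one agent."""
    covered = [False] * (n_agents - 1)
    for _ in range(n_agents):  # t = N evaluators
        sample = sorted(rng.sample(range(n_agents), k))  # agents labelled by true rank
        for a, b in zip(sample, sample[1:]):
            if b - a == 1:  # neighbours in the true ranking were seen together
                covered[a] = True
    return all(covered)


def recovery_probability(n_agents, k, n_trials=200, seed=0):
    """Fraction of trials in which the full ranking can be reconstructed."""
    rng = random.Random(seed)
    wins = sum(ranking_recoverable(n_agents, k, rng) for _ in range(n_trials))
    return wins / n_trials
```

Scanning `k` at fixed `n_agents` and locating the value where `recovery_probability` crosses a chosen $P$ gives an estimate that can be compared with the $\sqrt{N \log(cN)}$ scaling of equation (3).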
Fig. 1. Minimal average number of comparisons per agent $k_r$ needed to achieve the
correct rank with probability $P = 0.7$ as a function of $N$. Circles are simulation
results averaged over 100 realizations, the solid line is the analytical estimate (3).
3 Preferential attachment



In real electronic communities fame plays an important role. Objects that
are already popular are more likely to receive comments, books mentioned by
important media are more likely to be reviewed, people who are much talked
about are more likely to be judged again, and so on. This "rich get
richer" phenomenon, often referred to as preferential attachment [9], appears
in numerous empirical data sets of social systems [10] and is thought to be one
of the factors responsible for the emergence of scale-free networks. A direct
consequence of this mechanism is that attendance is not a fair indicator of
web pages' quality [11].

Reputation systems collect evaluations about objects, aggregate and release 
them. The aggregation process often consists in computing an average of the 
received votes for each object. Assuming that objects have an intrinsic quality 
$q_i$ and that evaluations are random variables with mean equal to $q_i$, the law
of large numbers ensures that computed reputations tend to the real intrinsic 
values once the number of users becomes large. The implicit assumption is that 
every object receives a growing number of evaluations, which is not always the 
case in the presence of some sort of preferential attachment. 

This can be verified within the framework of the occupancy problem of section
2: one user per time step $t$ evaluates $k = 1$ object among the $N$ available. Instead
of drawing the objects to be evaluated from a uniform distribution, let us now
define

$$p_i(t) = \frac{v_i^{\alpha}(t)}{\sum_{j=1}^{N} v_j^{\alpha}(t)}, \qquad (4)$$

where $v_i(t)$ is the number of evaluations received by object $i$ up to time $t$, with
$v_i(0) = 1 \; \forall i$. The parameter $\alpha$ sets the strength of the preferential attachment.
In the limit $\alpha \to \infty$ one has $p_i(t) \to \delta(i - i^*)$, where $i^* = \arg\max_j \{v_j(t)\}$; if
$\alpha = 0$, on the other hand, $p_i$ becomes uniform.
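The selection rule (4) is straightforward to simulate. The sketch below is an illustration rather than the authors' code; `simulate_visits` and `count_unvisited` are hypothetical names, and the sampling relies on the standard library's `random.choices`.

```python
import random


def simulate_visits(n_objects, alpha, n_steps, seed=0):
    """Evolve the visit counts v_i(t) under rule (4), starting from v_i(0) = 1."""
    rng = random.Random(seed)
    visits = [1] * n_objects     # v_i(0) = 1 for all i
    weights = [1.0] * n_objects  # current values of v_i^alpha
    indices = range(n_objects)
    for _ in range(n_steps):
        # p_i(t) is proportional to v_i^alpha; choices() normalizes the weights
        i = rng.choices(indices, weights=weights)[0]
        visits[i] += 1
        weights[i] = float(visits[i]) ** alpha
    return visits


def count_unvisited(visits):
    """Number of objects that never received an evaluation (still at v_i = 1)."""
    return sum(1 for v in visits if v == 1)
```

With `alpha = 0` the rule reduces to the uniform case and every object is eventually evaluated, whereas for large `alpha` a few objects collect almost all evaluations and `count_unvisited` stays positive.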

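As the real rank is not reached with certainty, it is useful to define a distance
on the rank space. Let

$$d(\mathbf{r}[1], \mathbf{r}[2]) \qquad (5)$$

[right-hand side illegible in the source] be the distance between two ranks, where

$$c(\mathbf{r}[1], \mathbf{r}[2]) = \frac{\sum_i r_i[1]\, r_i[2]}{\sqrt{\sum_i r_i^2[1] \, \sum_i r_i^2[2]}}$$

is the rank correlation coefficient [12].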
Fig. 2. Time dependence of the distance (5) between the "God given rank" and
the rank $\mathbf{r}$ produced by the model with preferential attachment (4). Symbols report,
in log-linear scale, results of a simulation with $N = 500$ agents and $\alpha = 2$ (circles),
$\alpha = 1$ (plus signs) and a third value of $\alpha$ that is missing in the source (diamonds).

In fig. 2 the time dependence of the distance between the "God given rank"
and the best guess $\mathbf{r}$, as defined in (5), is reported. Numerical simulations show
that for $\alpha > 1$ the distance reaches a plateau $> 0$, meaning that one can never
achieve an arbitrarily good estimate of the qualities $\mathbf{q}$ because some agents never
get to be evaluated. For $\alpha < 1$, on the other hand, the real rank is recovered
in a finite number of time steps; at the transition point $\alpha = 1$ one needs an
infinite number of steps to evaluate all objects.

In fact we can explain this behavior with the following argument. The
transition appears when the number of visits $v_i(t)$ does not grow in time for
some $i$. For most objects the number of visits still grows linearly in time,

$$v_i(t) \simeq f_i t, \qquad (6)$$

close to the transition. We can thus approximate the denominator of (4) as
$\sum_{j=1}^{N} v_j^{\alpha}(t) \simeq N t^{\alpha} I(\alpha)$, where $I(\alpha) = \frac{1}{N} \sum_{i=1}^{N} f_i^{\alpha}$. An object that
has never been evaluated keeps $v_i = 1$, so that $p_i(s) \simeq 1/[N s^{\alpha} I(\alpha)]$. The probability for a given object
$i$ never to receive an evaluation up to time $t$ is $Q_i(t) = \prod_{t'=0}^{t} (1 - p_i(t'))$, the log
of which, $\log(Q_i) \simeq -\frac{1}{N I(\alpha)} \sum_{s=1}^{t} s^{-\alpha}$, only converges to a finite value for $\alpha > 1$:
the transition value is indeed $\alpha = 1$.
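The convergence argument can be made concrete by evaluating the partial sums numerically. In this sketch $I(\alpha)$ is treated as a given constant (the parameter `i_alpha`, an assumption made only for illustration), and both function names are hypothetical.

```python
import math


def partial_sum(alpha, t):
    """Partial sum of s^(-alpha) for s = 1..t, which controls log(Q_i)."""
    return sum(s ** (-alpha) for s in range(1, t + 1))


def never_evaluated_probability(alpha, t, n_objects, i_alpha=1.0):
    """Q_i(t) ~ exp(-sum_{s<=t} s^(-alpha) / (N I(alpha))); i_alpha is an assumed constant."""
    return math.exp(-partial_sum(alpha, t) / (n_objects * i_alpha))
```

For `alpha = 2` the partial sum saturates near $\pi^2/6$, so the probability of never being evaluated stays finite at all times; for `alpha = 1` the sum grows like $\log t$ and for `alpha < 1` like $t^{1-\alpha}$, so that probability decays to zero.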

One can generalize the preferential attachment rule (4) by introducing multiplicative
intrinsic qualities: high quality agents have a higher chance of being evaluated,
similarly to the model proposed in ref. [13]. Let us define the probability
of being evaluated as follows:

$$p_i(t) = \frac{q_i v_i^{\alpha}(t)}{\sum_{j=1}^{N} q_j v_j^{\alpha}(t)}. \qquad (7)$$

We can again approximate the denominator of (7) as $\sum_{j=1}^{N} q_j v_j^{\alpha}(t) \simeq N t^{\alpha} I(\alpha)$,
where $I(\alpha) = \int_{-\infty}^{+\infty} \rho(q)\, q f^{\alpha}(q)\, dq$ and $\rho$ is the probability distribution of the
quality vector $\mathbf{q}$. The same transition at $\alpha = 1$ is found and equation (6)
yields



(8) 
(9) 
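The displayed forms of (8) and (9) are not reproduced here. As a hedged sketch, and not necessarily the authors' exact expressions, inserting the linear growth (6) into rule (7) and requiring self-consistency of the rate $f_i = dv_i/dt \simeq p_i(t)$ suggests, for $\alpha < 1$:

```latex
\begin{align*}
f_i = \frac{dv_i}{dt} \simeq p_i(t)
    &= \frac{q_i (f_i t)^{\alpha}}{N t^{\alpha} I(\alpha)}
     = \frac{q_i f_i^{\alpha}}{N I(\alpha)} \\
\Rightarrow\quad f(q) &\simeq \left[\frac{q}{N I(\alpha)}\right]^{\frac{1}{1-\alpha}}, \\
\log f(q) &\simeq \frac{1}{1-\alpha}\,\bigl[\log q - \log\bigl(N I(\alpha)\bigr)\bigr].
\end{align*}
```

Under this assumption the rate of visits is a power of the quality, and the slope $1/(1-\alpha)$ of $\log f$ against $\log q$ diverges as $\alpha \to 1$, consistent with the transition discussed in the text.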
Numerical simulations are shown to agree with this approximation in figure
3. The left panel displays the $q$ dependence of $f$ for fixed $\alpha$: equation (8)
asymptotically matches the simulation data. In the right panel of figure 3
the ratio $\gamma$ between the logarithm of the rate of visits $v_i/t$ and that of the
intrinsic quality $q_i$ is plotted against $\alpha$. Equation (9) fits the data for $\alpha < \alpha_c$
and diverges at $\alpha = 1$ as expected. The simulations were carried out with a
uniform quality distribution $\rho(q)$.

The distribution of the number of visits becomes a power law around the
transition point $\alpha = 1$, as reported in figure 4.

The results described in this section can be easily extended to the case of
pairwise comparisons introduced in section 2. That is, agents compare 2 objects
per time step, chosen according to equation (7), and rank them in order
of decreasing quality. The global ranking is finally established on the basis of
these partial orderings. The question is to find the maximal value of $\alpha$ that
allows one to recover the true rank with finite probability. This can be done by
evaluating the probability that two agents (belonging to the set $G_{N-1}$) will
never be compared. By very similar reasoning to the absolute judgment case,
one obtains $\alpha_c = 1/2$.
Fig. 3. (Left graph) Average rate of visits $f$ to an agent plotted against her intrinsic
quality $q$, after $10^5$ time steps, with $\alpha = 0.5$. (Right graph) Ratio $\gamma$ between the
logarithm of the rate of visits $v_i/t$ and that of the intrinsic quality $q_i$ plotted against
$\alpha$. Circles are simulation data, lines are calculated with equation 8. Other simulation
parameters: number of agents $N = 500$, number of realizations = 10. Qualities are
uniformly distributed.
Fig. 4. Probability distribution $P(v)$ of the number of visits. Circles are simulation
data of the model characterized by equation (7) with $\alpha = 1$ and $N = 500$, after $10^5$
steps. Here qualities are uniformly spaced, i.e. $q_i = i$. The solid line is a power law
with exponent $-1.3$.